-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCS/UCP/RNDV: RNDV pipeline invalidation #10204
base: master
Are you sure you want to change the base?
Conversation
5670125
to
d9f1684
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first bunch of comments
if (ucp_request_memh_check_invalidate(req, 0)) { | ||
ucp_request_memh_invalidate(req, status, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having one function looks cleaner to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it looks cleaner. But the purpose was to make it safe. Here is the existing code:
<< When ucp_request_memh_invalidate succeeds, it populates the `invalidate`
<< structure within request.send union, which invalidates all the other structures like `rndv`
<< Basically after invalidation you cannot use other structures within union
if (ucp_request_memh_invalidate(req, status)) {
<< This is unsafe, `rndv` struct is overwritten by this moment!
<< Well, it works by chance, because for now `invalidate` struct contains just one pointer,
<< so it does not overwrite much from `rndv`. With this commit we extend `invalidate` struct,
<< so it's not possible to access rndv after invalidation
if (req->send.rndv.rkey != NULL) {
ucp_proto_rndv_rkey_destroy(req);
}
ucp_proto_request_zcopy_id_reset(req);
return;
}
So the split is done, because we need to perform some cleanup tasks on invalidated requests, but essentially we can do that only before invalidation actually happens.
I can check whether we can move this conditional cleanup tasks before - in that can split is not needed.
params.flags = 0; | ||
} | ||
|
||
ucs_trace("de-registering %smemh[%d]=%p", (rkey_only ? "rkey of " : ""), | ||
md_index, memh->uct[md_index]); | ||
ucs_assert(context->tl_mds[md_index].attr.flags & UCT_MD_FLAG_REG); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucs_assertv() and print the flags
src/ucp/core/ucp_mm.c
Outdated
static void ucp_memh_dereg_rkey(ucp_context_h context, ucp_mem_h memh, | ||
ucp_md_map_t md_map) | ||
{ | ||
ucs_trace("memh %p: invalidate only indirect rkeys: md_map %" PRIx64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls remove indirect, as it is UCT IB specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
@@ -798,7 +828,8 @@ ucp_memh_init_uct_reg(ucp_context_h context, ucp_mem_h memh, | |||
|
|||
cache_md_map = context->cache_md_map[mem_type] & reg_md_map; | |||
|
|||
if ((context->rcache == NULL) || (cache_md_map == 0)) { | |||
if ((context->rcache == NULL) || (cache_md_map == 0) || | |||
(uct_flags & UCT_MD_MEM_FLAG_NO_RCACHE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should not be a new UCT flag because it has no meaning in UCT and is only used in UCP
src/ucp/core/ucp_mm.c
Outdated
{ | ||
ucp_mem_desc_t *mdesc = ucp_worker_mpool_get(mpool); | ||
|
||
mdesc->memh->super.refcount ++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mdesc->memh->super.refcount ++; | |
mdesc->memh->super.refcount++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/ucs/datastruct/mpool.c
Outdated
ucs_mpool_add_to_freelist(mp, elem); | ||
} | ||
|
||
ucs_debug("mpool %s: added %u elements of chunk %p to the freelist", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use some other level, like trace.
debug is enabled in UCX release build and we do not add any traffic related traces to debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/ucp/core/ucp_mm.c
Outdated
|
||
/* If we reach this point, it means that chunk is marked for invalidation, | ||
* and all of its inflight operations are completed. Now we can de-register | ||
* its indirect rkeys and revive the chunk = return it back to mpool. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls remove indirect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/ucp/core/ucp_mm.c
Outdated
* and all of its inflight operations are completed. Now we can de-register | ||
* its indirect rkeys and revive the chunk = return it back to mpool. */ | ||
ucp_memh_dereg_rkey(context, memh, memh->inv_md_map); | ||
ucs_assert(0 == memh->inv_md_map); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ucs_assert(0 == memh->inv_md_map); | |
ucs_assertv(0 == memh->inv_md_map, "inv_md_map=0x%lx", memh->inv_md_map); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
if (rkey_only) { | ||
params.flags |= UCT_MD_MEM_DEREG_FLAG_INVALIDATE_RKEY_ONLY; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can pass uct_flags
instead of rkey_only
and pass UCT_MD_MEM_DEREG_FLAG_INVALIDATE_RKEY_ONLY
directly. This would simplify this code
@@ -1490,6 +1504,16 @@ uct_ib_mlx5_devx_mem_dereg(uct_md_h uct_md, | |||
return status; | |||
} | |||
|
|||
/* If requested invalidation of only rkeys, then return success */ | |||
flags = UCT_MD_MEM_DEREG_FIELD_VALUE(params, flags, FIELD_FLAGS, 0); | |||
if (flags & UCT_MD_MEM_DEREG_FLAG_INVALIDATE_RKEY_ONLY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we always do invalidation of just rkey without introducing a new flag? Why do we need two flags now?
What
Invalidation for RNDV pipeline protocol based on put_mtype/rtr_mtype.
In the previous PR #10075, we advertised error handling capability for rndv_ppln over rnvd_(put/rtr)_mtype) so that this protocol can be selected.
Design document
Why ?
We want rndv_ppln over rnvd_(put/rtr)_mtype) protocol to be selected over transport with error handling, for performance reasons.
How ?